ggplot2
Repeated measures for 11 individuals, mean (sd)
| Round | Duration | Number Correct |
|---|---|---|
| 1 | 7.5 (2.0) | 9.0 (3.3) |
| 2 | 7.5 (2.0) | 9.0 (3.3) |
| 3 | 7.5 (2.0) | 9.0 (3.3) |
| 4 | 7.5 (2.0) | 9.0 (3.3) |
Regression of Duration on Number Correct repeated for each round
| Round | Term | Estimate | SE |
|---|---|---|---|
| 1 | (Intercept) | 3.0 | 1.12 |
|  | num.correct | 0.5 | 0.12 |
| 2 | (Intercept) | 3.0 | 1.13 |
|  | num.correct | 0.5 | 0.12 |
| 3 | (Intercept) | 3.0 | 1.12 |
|  | num.correct | 0.5 | 0.12 |
| 4 | (Intercept) | 3.0 | 1.12 |
|  | num.correct | 0.5 | 0.12 |
ALWAYS. LOOK. AT. THE. DATA.
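The two tables above can be reproduced along the following lines. This is only a sketch: it assumes that R's built-in `anscombe` data is a reasonable stand-in for `test_data` (the rows printed later in this deck appear to match it, but that is an inference), and `quartet`, `means`, `sds`, and `fits` are illustrative names.

```r
# Assumption: the built-in `anscombe` data stands in for test_data.
# Reshape its four x/y pairs into one long data frame, one block per round.
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    round       = factor(i),
    respondent  = factor(seq_len(nrow(anscombe))),
    num.correct = anscombe[[paste0("x", i)]],
    duration    = anscombe[[paste0("y", i)]]
  )
}))

# Mean and standard deviation of both measures, per round
means <- aggregate(cbind(duration, num.correct) ~ round, data = quartet, FUN = mean)
sds   <- aggregate(cbind(duration, num.correct) ~ round, data = quartet, FUN = sd)

# Regression of duration on num.correct, fitted separately for each round
fits <- lapply(split(quartet, quartet$round), function(d) {
  coef(summary(lm(duration ~ num.correct, data = d)))
})

print(means)
print(sds)
print(fits[["1"]])
```

The summaries are nearly identical across rounds, which is exactly why the numbers alone cannot tell the rounds apart and a plot is needed.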
ggplot2
A visual display that illustrates one or more relationships among numbers…a shorthand means of presenting information that would take many more words and numbers to describe.
—Stephen M. Kosslyn. Graph Design for the Eye and Mind. Oxford University Press, 2006
It depends on the goal:
A graph intended for others to look at must have at least these two properties
Both the question and main comparison should be obvious to you and the viewer
If they are confused, they won’t try to understand your graph
ggplot2 do much of this work for you
Try to anticipate the process the audience will go through while looking at your graph
Decide what you want them to remember; everything else is secondary to that
data.frame
test_data
## # A tibble: 44 x 4
## round respondent num.correct duration
## <fct> <fct> <dbl> <dbl>
## 1 1 1 10 8.04
## 2 1 2 8 6.95
## 3 1 3 13 7.58
## 4 1 4 9 8.81
## 5 1 5 11 8.33
## 6 1 6 14 9.96
## 7 1 7 6 7.24
## 8 1 8 4 4.26
## 9 1 9 12 10.8
## 10 1 10 7 4.82
## # … with 34 more rows
data.frame) for every layer
my_plot <- ggplot(data = test_data,
                  mapping = aes(x = duration, y = num.correct))
aes() is used to create a list of aesthetic mappings
x refers to the graph’s x-axis, y to the y-axis
duration \(\rightarrow\) x-axis
num.correct \(\rightarrow\) y-axis
my_plot now represents a ggplot object set to our defaults
data comes first, mapping comes second
my_plot <- ggplot(test_data, aes(x = duration, y = num.correct))
print(my_plot)
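As a small check of the two calling styles above, the sketch below builds the plot object both ways and compares what a point layer would draw. The ten-row `test_data` is rebuilt by hand from the first rows printed earlier (an assumption made to keep the example self-contained), and `named` and `positional` are illustrative names.

```r
library(ggplot2)

# Assumption: a ten-row stand-in for test_data, copied from the rows printed above
test_data <- data.frame(
  round       = factor(1),
  respondent  = factor(1:10),
  num.correct = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7),
  duration    = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.8, 4.82)
)

# Fully named arguments
named <- ggplot(data = test_data, mapping = aes(x = duration, y = num.correct))

# Positional arguments: data first, mapping second
positional <- ggplot(test_data, aes(x = duration, y = num.correct))

# Neither object has a layer yet, so printing either one shows only empty axes.
# Adding the same layer to both yields the same computed data.
same <- all.equal(layer_data(named + geom_point()),
                  layer_data(positional + geom_point()))
print(same)
```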
+ operator to combine ggplot elements
my_plot + geom_point()
print() call, so the following two lines are equivalent:
my_plot + geom_point()
print(my_plot + geom_point())
my_plot + geom_point()
my_plot + geom_line()
my_plot + geom_point() + geom_line()
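One way to see that each `+` adds a separate layer is to pull out the data each layer would draw with `layer_data()`. This is a sketch under the assumption that a hand-built, ten-row stand-in for `test_data` (copied from the rows printed earlier) is close enough; `both`, `points_layer`, and `lines_layer` are illustrative names.

```r
library(ggplot2)

# Assumption: a ten-row stand-in for test_data, copied from the rows printed earlier
test_data <- data.frame(
  num.correct = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7),
  duration    = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.8, 4.82)
)

my_plot <- ggplot(test_data, aes(x = duration, y = num.correct))

# Two layers drawn from the same defaults: points first, then a line on top
both <- my_plot + geom_point() + geom_line()

points_layer <- layer_data(both, 1)  # data behind geom_point()
lines_layer  <- layer_data(both, 2)  # data behind geom_line()

print(head(points_layer[, c("x", "y")]))
print(head(lines_layer[, c("x", "y")]))
```

Both layers inherit `data` and `mapping` from `my_plot`, so neither geom needs its own arguments here.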
identity function, \[f(x)=x\] That is, the data are left unchanged
geom_point and geom_line is identity so these plots show the data as is
geom_histogram is a binning function (called stat_bin)
ggplot(test_data, aes(x = duration)) + geom_histogram(binwidth = 2)
Result of applying binning function to duration
## # A tibble: 44 x 4
## round respondent num.correct duration
## <fct> <fct> <dbl> <dbl>
## 1 1 1 10 8.04
## 2 1 2 8 6.95
## 3 1 3 13 7.58
## 4 1 4 9 8.81
## 5 1 5 11 8.33
## 6 1 6 14 9.96
## 7 1 7 6 7.24
## 8 1 8 4 4.26
## 9 1 9 12 10.8
## 10 1 10 7 4.82
## # … with 34 more rows
## # A tibble: 5 x 2
## x y
## <dbl> <dbl>
## 1 4 4
## 2 6 13
## 3 8 20
## 4 10 5
## 5 12 2
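A binned table like the one above can be inspected directly with `layer_data()`, which returns the data a layer draws after its stat has run. The sketch below assumes a ten-row stand-in for `duration` (copied from the rows printed earlier), so its counts will differ from the 44-row result shown; `hist_plot` and `binned` are illustrative names.

```r
library(ggplot2)

# Assumption: only the ten duration values printed earlier, not all 44 rows
test_data <- data.frame(
  duration = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.8, 4.82)
)

hist_plot <- ggplot(test_data, aes(x = duration)) + geom_histogram(binwidth = 2)

# x is the centre of each bin, count is the number of rows that fell into it
binned <- layer_data(hist_plot)[, c("x", "count")]
print(binned)
```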
| Item | Default stat/geom |
|---|---|
| geom_point | stat_identity (\(f(x)=x\)) |
| geom_line | stat_identity (\(f(x)=x\)) |
| geom_histogram | stat_bin (binning) |
| geom_smooth | stat_smooth (regression) |
| stat_smooth | geom_smooth (line + ribbon) |
| stat_bin | geom_bar (vertical bars) |
| stat_identity | geom_point (dots) |
ggplot(test_data, aes(x = duration)) + stat_bin(binwidth = 1)
ggplot(test_data, aes(x = duration)) + geom_histogram(binwidth = 1)
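Because `stat_bin` defaults to bars and `geom_histogram` defaults to binning, the two calls above should produce the same bins. A quick way to check, again assuming a ten-row stand-in for `duration` and using the illustrative names `via_stat` and `via_geom`:

```r
library(ggplot2)

# Assumption: only the ten duration values printed earlier
test_data <- data.frame(
  duration = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.8, 4.82)
)

base <- ggplot(test_data, aes(x = duration))

# Start from the stat (the geom is filled in by default) ...
via_stat <- layer_data(base + stat_bin(binwidth = 1))

# ... or start from the geom (the stat is filled in by default)
via_geom <- layer_data(base + geom_histogram(binwidth = 1))

print(all.equal(via_stat$count, via_geom$count))
```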
ggplot(test_data, aes(x = round,
y = duration)) + geom_point()
ggplot(test_data, aes(x = round,
y = duration)) + geom_boxplot()
| Item | Required | Optional |
|---|---|---|
| geom_point | x, y | alpha, colour, fill, shape, size, stroke |
| geom_line | x, y | alpha, colour, linetype, size |
| geom_pointrange | x, ymax, ymin | alpha, colour, linetype, size |
my_plot + geom_point(
mapping = aes(colour = round))
my_plot + geom_point(
colour="red")
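The difference between mapping an aesthetic inside `aes()` and setting it outside can be seen in the colours each layer ends up with. The sketch below uses the `mpg` data bundled with ggplot2 (introduced later in this deck) rather than `test_data`, purely so that it runs on its own; `mapped`, `fixed`, and `pitfall` are illustrative names, and the third plot shows a common mistake rather than anything from the slides.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy))

# Mapped: colour varies with a variable, and a legend is drawn
mapped <- base + geom_point(mapping = aes(colour = drv))

# Set: every point gets the literal colour "red", no legend
fixed <- base + geom_point(colour = "red")

# Pitfall: inside aes(), "red" is treated as data (a one-level variable),
# so ggplot2 picks a colour from its default palette instead of using red
pitfall <- base + geom_point(mapping = aes(colour = "red"))

print(unique(layer_data(mapped)$colour))
print(unique(layer_data(fixed)$colour))
print(unique(layer_data(pitfall)$colour))
```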
identity meaning don’t do anything special
stack or dodge
g <- ggplot(test_data, aes(x = num.correct, fill = round))
g + stat_bin(binwidth = 4,
position = 'stack')
g + stat_bin(binwidth = 4,
position = 'dodge')
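What `stack` and `dodge` actually change shows up in the bar coordinates. The sketch below swaps in the `mpg` data bundled with ggplot2 (binning `cty`, filled by `drv`) so that it runs without `test_data`; that substitution and the names `stacked` and `dodged` are assumptions for illustration.

```r
library(ggplot2)

# Assumption: mpg stands in for test_data, with cty binned and drv as the fill
g <- ggplot(mpg, aes(x = cty, fill = drv))

stacked <- layer_data(g + stat_bin(binwidth = 4, position = "stack"))
dodged  <- layer_data(g + stat_bin(binwidth = 4, position = "dodge"))

# Stacking moves bars up: some bars start above zero
print(range(stacked$ymin))

# Dodging moves bars sideways: every bar still starts at zero
print(range(dodged$ymin))
```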
Cmd-Enter (Mac) or Control-Enter (Windows)
mpg which is included in the ggplot2 package
library(tidyverse)
?mpg
Fuel economy data from 1999 and 2008 for 38 popular models of car
Description:
This dataset contains a subset of the fuel economy data that the
EPA makes available on http://fueleconomy.gov. It contains
only models which had a new release every year between 1999 and
2008 - this was used as a proxy for the popularity of the car.
Usage:
mpg
Format:
A data frame with 234 rows and 11 variables
manufacturer
model model name
displ engine displacement, in litres
year year of manufacture
cyl number of cylinders
trans type of transmission
drv f = front-wheel drive, r = rear wheel drive, 4 = 4wd
cty city miles per gallon
hwy highway miles per gallon
fl fuel type
class "type" of car
mpg
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compact
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compact
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
## 7 audi a4 3.1 2008 6 auto(av) f 18 27 p compact
## 8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact
## 9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact
## 10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p compact
## # … with 224 more rows
x mapped to cty
y mapped to hwy
point geometry
identity stat
identity position
Go to http://jasonmtroos.github.io/rook/ and click on session_2_in_class_work_handout
Do Tasks 1–4
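The five-part description of the first `mpg` plot (x, y, geometry, stat, position) can be spelled out in full, as in the sketch below. It is one possible reading of that list, not necessarily the handout's intended answer; `explicit` and `shorthand` are illustrative names.

```r
library(ggplot2)

# Every component written out: mappings, point geometry, identity stat and position
explicit <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(stat = "identity", position = "identity")

# The same plot relying on geom_point()'s defaults
shorthand <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()

# Both versions should draw identical data
print(all.equal(layer_data(explicit), layer_data(shorthand)))
```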
colour, shape, or size
facet
g <- ggplot(mpg, aes(x = displ, y = hwy))
g + geom_point(aes(colour = drv))
g + geom_point() + facet_wrap(~drv)
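The two approaches above separate the `drv` groups in different ways: one keeps a single panel and varies colour, the other splits the data across panels. A small sketch that makes this visible in the computed layer data, with `coloured` and `faceted` as illustrative names:

```r
library(ggplot2)

g <- ggplot(mpg, aes(x = displ, y = hwy))

# One panel, three colours
coloured <- layer_data(g + geom_point(aes(colour = drv)))

# Three panels, one colour
faceted <- layer_data(g + geom_point() + facet_wrap(~drv))

print(c(panels = length(unique(coloured$PANEL)),
        colours = length(unique(coloured$colour))))
print(c(panels = length(unique(faceted$PANEL)),
        colours = length(unique(faceted$colour))))
```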
colour, shape, or size, ggplot2 automatically maps those variables to group
group aesthetic controls how collections of items are rendered
For geom_line, the group aesthetic determines which points will be connected by a continuous line
For stat_summary, the group aesthetic determines which points are summarised by a common statistic
If v is continuous but you want to use it for grouping, either specify group = v or transform it into a discrete variable, e.g., colour = factor(v)
ggplot(mpg, aes(x = displ, y = hwy,
colour=cyl)) +
geom_point() + geom_smooth()
ggplot(mpg, aes(x = displ, y = hwy,
colour=factor(cyl))) +
geom_point() + geom_smooth()
aes(group=1) when creating a layer
ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
geom_point() + geom_smooth(aes(group = 1))
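The grouping behaviour in the three plots above can be checked by counting the groups each layer ends up with. In this sketch the smoother uses `method = "lm"` with an explicit formula, which is a substitution made here for quiet, deterministic output and not what the slides use; `continuous`, `discrete`, and `overridden` are illustrative names.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy))

# Continuous colour: ggplot2 does not split the data into groups
continuous <- layer_data(base + geom_point(aes(colour = cyl)))

# Discrete colour: one group per level of factor(cyl)
discrete <- layer_data(base + geom_point(aes(colour = factor(cyl))))

# aes(group = 1) on the smoother overrides the grouping implied by colour,
# so a single line is fitted through all points.
# Assumption: method = "lm" replaces the default smoother for this sketch.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x)
overridden <- layer_data(p, 2)

print(length(unique(continuous$group)))
print(length(unique(discrete$group)))
print(length(unique(overridden$group)))
```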
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point() +
scale_y_log10(breaks = c(15, 30, 45))
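A log scale transforms the data itself before anything is drawn, while `breaks` only controls where the axis is labelled. The following sketch checks the first point by comparing the layer's y values with `log10(hwy)`; `log_plot` and `drawn` are illustrative names.

```r
library(ggplot2)

log_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_log10(breaks = c(15, 30, 45))

# The y values held by the layer are on the log10 scale,
# even though the axis labels show 15, 30 and 45
drawn <- layer_data(log_plot)
print(all.equal(drawn$y, log10(mpg$hwy)))
```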
ggplot(mpg, aes(x = displ,
y = hwy,
colour = drv)) +
geom_point() +
labs(x = "Displacement (litres)",
y = "Highway miles per gallon",
colour = "Drive train",
title = "Automobile features")
mpg2 <- mpg %>%
mutate(drv2 = case_when(drv == 'f' ~ 'Front',
drv == '4' ~ '4WD',
drv == 'r' ~ 'Rear'))
ggplot(mpg2, aes(x = displ, y = hwy, colour = drv2)) + geom_point() +
labs(colour = "Drive train")
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point() +
facet_wrap(~ drv, labeller = as_labeller(c('f' = 'Front',
'r' = 'Rear',
'4' = '4WD')))
forcats package to relabel/reorder factors
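As a sketch of the forcats route, `fct_recode()` can relabel the `drv` codes and `fct_reorder()` can order the levels by another variable. The column names `drv2` and `drv3`, the data frame `mpg3`, and the choice to order by median highway mileage are assumptions made for illustration.

```r
library(ggplot2)
library(forcats)

mpg3 <- mpg

# Relabel: new label on the left, existing code on the right
mpg3$drv2 <- fct_recode(mpg3$drv, Front = "f", Rear = "r", `4WD` = "4")

# Reorder: sort the levels by median highway mileage (illustrative choice)
mpg3$drv3 <- fct_reorder(mpg3$drv2, mpg3$hwy, .fun = median)

print(levels(mpg3$drv2))
print(levels(mpg3$drv3))

# The level order controls the order of the legend entries
ggplot(mpg3, aes(x = displ, y = hwy, colour = drv3)) +
  geom_point() +
  labs(colour = "Drive train")
```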